Multi-Factor Stock Analysis Using a Multilayer Perceptron

Background

Factor investing is a security selection approach that targets specific elements, or factors, that appear to drive the returns of an asset. The premise behind factor investing is that, using quantitative methods, various risk premia can be identified, isolated, and exploited to the investor's advantage. This idea is the cornerstone of almost all quantitative portfolio management methods. The impact factor investing has had on the financial industry is profound, so much so that we have seen the world's largest asset manager (BlackRock) shift massive amounts of assets away from "old school" fundamental managers to purely quantitative strategies, and mandate that whatever fundamental managers remained use more data-driven processes. Factor investing is not a new idea, but recent developments in computing and mathematics (machine learning) allow practitioners to approach it from an entirely different angle.

Perspective

The model makes the following underlying assumptions:

1) Predicting future returns is often considered a fool's errand by professionals. The reason is that too many factors can impact the price of a given security at any given moment to account for them all accurately. Furthermore, the impact and importance of these factors are constantly changing. Despite these challenges, most practitioners continue to try.

2) Selecting the stocks with the highest probability of a desired outcome based on common factors will lead to a better result than selecting stocks based on an expected (predicted) return, assuming accurate predictions are not sustainable over longer periods of time.

3) It is at least as important to avoid bad stocks as it is to invest in good ones: selecting good stocks helps generate returns, while avoiding bad stocks helps minimize losses.

The aforementioned assumptions make clear that there could be significant benefits to framing the question of security selection as a classification problem rather than a regression problem. To solve such a problem, we propose using a multilayer perceptron to classify stocks into one of two categories (binary classification): 1) at or above the median return of all stocks over the next year, or 2) below the median return over the next year.

Why MLP?

MLP vs Other Classification Models

Neural networks offer some advantages for this task over other supervised classification algorithms. First, MLPs can model nonlinear relationships. Second, traditional factor investing processes require that we identify and manipulate features on our own, whereas a neural network can learn useful combinations of features itself, letting us attack the problem directly with the data. Lastly, neural networks are generally better at dealing with noisy data, a very common issue in finance.

MLP vs Other Neural Network Types - considerations

Panel Data

The data in question is panel data, which is in a class of its own. Simply put, panel data is neither pure time series data nor pure cross-sectional data; it is both. The multidimensional nature of panel data adds a clear layer of complexity when deciding which model to use. On one hand, the time series aspect of the data suggests that RNN or LSTM models would be better suited to it. On the other hand, the tabular nature of the data suggests that the MLP may be better suited for the task. In the end, given that our objective was to classify rather than to predict a future price, the time element seemed secondary. We therefore chose to treat the data as cross-sectional and to ignore/minimize the effect of time in the data, making the multilayer perceptron the best choice for the task as framed.

Data

The model uses data from four different categories:

1) Accounting data – data taken directly from a company's financial statements (income statement, balance sheet, cash flow statement).

2) Trading data – data based on market activity over a given period of time.

3) Valuation data – generally market data normalized by accounting data.

4) Technical indicators – moving averages over various windows based on the price of the stock.

Stage 1: Importing Data and Feature Engineering

1) Refinitiv Eikon (API) – a subscription-based financial data provider that offers access to thousands of corporate and financial data sets for companies and markets around the globe.

2) Yahoo Finance (API) – within the scope of this model, the Yahoo Finance API is used strictly for stock price history, for the sake of convenience.

3) Datastream Webservices – a subscription-based financial and economic data provider that also offers access to data sets for companies, countries, and markets around the world.

To limit the need for multiple calls and to minimize the use of local storage, the raw data from the aforementioned sources is stored in an SQL database and updated regularly from a machine with access to an Eikon Terminal, which is required for two of the three sources. To view the code for the calls, I have provided a notebook titled Data Collection Five Factor. The functions below import all the required data from the database. THIS CAN TAKE UP TO ONE HOUR TO COMPLETE; do not run it unnecessarily.
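The load pattern can be sketched with pandas' `read_sql`. The actual database, table, and column names are not specified in this document, so the ones below are hypothetical; an in-memory SQLite database stands in for the real SQL store.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for the project's SQL database: the table and
# column names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounting_data (ticker TEXT, date TEXT, total_assets REAL)"
)
conn.executemany(
    "INSERT INTO accounting_data VALUES (?, ?, ?)",
    [("AAA", "2020-12-31", 100.0), ("BBB", "2020-12-31", 250.0)],
)

# pandas reads the query result straight into a DataFrame
df = pd.read_sql("SELECT * FROM accounting_data", conn)
print(df.shape)  # (2, 3)
```

Against the real database, only the connection object and the SQL query would change; the `read_sql` call is the same.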

Stage 2: Data Exploration

The following data evaluation process has three purposes.

  1. To identify any existing relationships among features or between features and the label.
  2. To understand the distribution of the data sets.
  3. To identify outliers and establish a need for any kind of transformation or standardization of the data.

Data Distribution Profile

The histogram charts below allow us to approximate the distribution of the various features fed into the model. Due to the asymmetric nature of many of the features, and the difficulty of identifying observations in the tails of the distribution via standard distribution plots, the data below was transformed to a logarithmic scale. Even after the transformation, the data is fraught with outliers.
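The effect of the log transform can be illustrated with a small synthetic example. The data here is a random lognormal sample standing in for one of the asymmetric features (the actual features are not reproduced); the point is only that the long right tail collapses to a workable range.

```python
import numpy as np

rng = np.random.default_rng(0)
# Heavily right-skewed stand-in for an asymmetric feature
raw = rng.lognormal(mean=0.0, sigma=1.5, size=10_000)

# The log transform compresses the long right tail so tail observations
# become visible on a single histogram axis; log1p also handles zeros.
logged = np.log1p(raw)

print(raw.max() / raw.min())        # enormous spread on the raw scale
print(logged.max() - logged.min())  # only a few units on the log scale
```

A histogram of `logged` is roughly symmetric, which is what makes the remaining outliers visible in the charts above.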

Correlations

Below is a basic correlation matrix. It makes very clear that correlations are for the most part weak (close to zero), with pockets of stronger positive correlations among certain features (accounting features in particular). We can deduce the following from the correlation matrix:

  1. There is no feature with a strong linear relationship (positive or negative) to 1 year forward returns.
  2. If a relationship between the features and forward returns does in fact exist, traditional methods like OLS regression will not provide us with a strong model.
  3. There is likely some multicollinearity amongst features that will have to be dealt with either by feature elimination or L2 regularization.
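Producing such a matrix is a one-liner in pandas. The frame below uses random data and illustrative column names rather than the model's actual features, but the call is the same one used to generate the matrix above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic stand-in: column names are illustrative, not the model's inputs
df = pd.DataFrame({
    "total_assets":  rng.normal(size=500),
    "one_year_vol":  rng.normal(size=500),
    "fwd_return_1y": rng.normal(size=500),
})

corr = df.corr()  # pairwise Pearson correlations, features x features
print(corr["fwd_return_1y"].round(2))  # each feature vs. the label column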

Box plots

We are not only interested in the correlation between the features and 1 year forward returns, but also in the distribution of those correlations. This allows us to better understand the impact any outliers may have on the data. The box plots below illustrate this distribution; the correlations are computed on a date-by-date basis for the entire cross-section of stocks. The plot shows that correlations mostly revolve around zero, but can reach as high as 0.7 for 1-year volatility and as low as -0.47 for 3-month momentum. The key takeaway from this segment is that not only are correlations between features and one-year forward returns low, as the correlation matrix demonstrated, but the dispersion of each feature's date-by-date correlation with one-year returns is significant in many cases.
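The date-by-date computation can be sketched with a `groupby` over dates. The panel below is synthetic and the feature name is illustrative; each group is one date's cross-section of stocks, and the box plot summarizes the resulting series of correlations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Synthetic panel: 50 stocks observed on each of 4 dates
panel = pd.DataFrame({
    "date":          np.repeat(["2019-03", "2019-06", "2019-09", "2019-12"], 50),
    "momentum_3m":   rng.normal(size=200),
    "fwd_return_1y": rng.normal(size=200),
})

# One cross-sectional correlation per date, as in the box plots above
per_date = panel.groupby("date").apply(
    lambda g: g["momentum_3m"].corr(g["fwd_return_1y"])
)
print(per_date)  # the box plot summarizes the spread of these values
```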

Effectively the data has forced us to make one of two conclusions:

  1. These data sets do not matter i.e., there is no meaningful relationship between any of the features and one year forward returns.

OR

  2. The relationship between the features and one year forward returns is nonlinear and noisy, and is therefore very difficult to pick up using correlations.

To deal with this issue we turn to domain knowledge: it is difficult to accept the idea that a collection of many of the most commonly used stock-picking features would have nothing to do with stock returns. As such, the second option appears more likely.

Data Prep

Outlier Detection and Elimination

To eliminate outliers, we chose the isolation forest anomaly detection algorithm. The general idea behind isolation forest is that when dealing with large amounts of data, it is easier to isolate anomalies than it is to isolate normal points. The algorithm works like a decision tree that randomly partitions the data, splitting each chosen feature at a random value between its minimum and maximum. Because outliers lie far from the bulk of the data, they tend to be separated by fewer random splits and therefore end up closer to the root of the tree; the average path length to a point across many such trees serves as its anomaly score.
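A minimal sketch with scikit-learn's `IsolationForest`, on synthetic data with a few planted outliers. The `contamination` value here is illustrative only; in the actual run roughly 22.5% of observations ended up discarded.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 4))   # synthetic feature matrix
X[:10] += 8                      # plant ten obvious outliers

# contamination sets the expected share of anomalies (illustrative here)
iso = IsolationForest(contamination=0.01, random_state=0)
labels = iso.fit_predict(X)      # +1 = inlier, -1 = outlier

X_clean = X[labels == 1]         # keep only the inliers
print(X.shape, "->", X_clean.shape)
```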

Label Creation

As previously discussed, the model's objective is to classify stocks into top-50% gainers and bottom-50% gainers. We decided on the median because, in addition to being a widely used measure of central tendency, it ensures a balanced training set regardless of the data's distribution. It also ensures that the model evaluates stocks on a relative basis, which we believe is advantageous despite the potential misclassifications near the cutoff.
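The label logic can be sketched as a per-date median split, so each period contributes a balanced set of 1s and 0s. The data and column names below are synthetic stand-ins.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "date":          np.repeat(["2019-03", "2019-06"], 100),
    "fwd_return_1y": rng.normal(size=200),
})

# Label = 1 if the stock's forward return is at or above that date's
# cross-sectional median, else 0; computed per date, so each period
# is balanced by construction.
med = df.groupby("date")["fwd_return_1y"].transform("median")
df["label"] = (df["fwd_return_1y"] >= med).astype(int)

print(df.groupby("date")["label"].mean())  # ~0.5 per date by construction
```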

Train Test Split and Feature Scaling

Based on the isolation forest model used to detect outliers, we discarded about 22.5% of the original observations as extreme. The remaining data must now be scaled and split into training and test sets. Due to the balanced nature of the data and the relatively small number of observations, we split it 80% train / 20% test.
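A standard sketch of this step, on synthetic data: the scaler is fit on the training set only and then applied to both sets, so no test-set information leaks into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = rng.normal(loc=3.0, scale=2.0, size=(500, 6))  # synthetic features
y = rng.integers(0, 2, size=500)                   # balanced binary labels

# 80/20 split; stratify keeps the label balance in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y
)

# Fit on train only, then transform both sets with the same parameters
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_test_s.shape)  # (400, 6) (100, 6)
```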

Model Selection

Tuning an MLP

We used the Keras package to construct our model. We built a sequential model with three hidden layers of multiple nodes, all of which use the ReLU activation function, and an output layer with a single node that uses the sigmoid activation function. We use the binary cross-entropy loss function and the Adam optimizer. We also include dropout after every hidden layer to help minimize potential overfitting, and L2 regularization to help with any effects of multicollinearity. Lastly, we use the Keras Random Search tuner to identify the ideal number of nodes in each hidden layer and the most effective learning rate for the optimizer. The tuner's objective is set to minimize the loss on the validation (test) set.
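The architecture can be sketched as follows. This is not the tuned model itself: the unit counts, dropout rate, L2 strength, and learning rate shown are placeholders for the values the Random Search tuner would choose.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_features, units=(64, 32, 16), dropout=0.2, lr=1e-3):
    # Placeholder hyperparameters; the tuner searches over units and lr.
    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for u in units:  # three hidden ReLU layers with L2 regularization
        model.add(layers.Dense(u, activation="relu",
                               kernel_regularizer=keras.regularizers.l2(1e-4)))
        model.add(layers.Dropout(dropout))            # dropout after each layer
    model.add(layers.Dense(1, activation="sigmoid"))  # binary output node
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy",
                  metrics=[keras.metrics.BinaryAccuracy(),
                           keras.metrics.Precision(),
                           keras.metrics.Recall()])
    return model

model = build_model(n_features=20)
out = model(np.zeros((4, 20), dtype="float32"))  # one probability per stock
print(out.shape)
```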

Tuning results and Evaluation of Selected Model

Once the tuner has identified the best model, we retrain it using the identified hyperparameters in order to evaluate the model's performance over all training epochs, based on the following metrics:

  1. Loss - (prediction error of the model)
  2. Binary accuracy - (how often the predicted label equals the true label)
  3. Precision - (how many true positives did the model select out of the total number of times it returned a positive result)
  4. Recall - (how many true positives did the model select out of the total number of true positives in the data)
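The four metrics above can be computed directly with scikit-learn. The labels and probabilities below are a made-up ten-stock example, not the model's output.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, log_loss

# Illustrative true labels and predicted probabilities for ten stocks
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_prob = np.array([0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.6, 0.3, 0.4])
y_pred = (y_prob >= 0.5).astype(int)   # threshold probabilities at 0.5

print("loss:",      round(log_loss(y_true, y_prob), 3))  # binary cross entropy
print("accuracy:",  accuracy_score(y_true, y_pred))      # 7 of 10 correct -> 0.7
print("precision:", precision_score(y_true, y_pred))     # 3 TP / 4 predicted positive -> 0.75
print("recall:",    recall_score(y_true, y_pred))        # 3 TP / 5 actual positives -> 0.6
```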

Evaluation of Results

Both the training and testing loss decline steadily over the first 125 epochs. At that point the validation set begins to level off, and the model can no longer improve on the loss. By epoch 150 the model appears to be slightly overfitting (0.61 train vs. 0.68 validation), but this does not seem extreme, nor does it impact the other metrics at the moment. Binary accuracy climbs steadily for both the training and validation sets, though the gradual climb tapers off around 0.62. While this may feel like a small number, context is important: if an algorithm existed that could pick the top 50% of stocks with 90% accuracy, everyone would adopt it, and as adoption spread, its accuracy and predictive power would decline. As such, the 0.62-0.65 accuracy range appears reasonable. The precision data indicates that our model is doing a reasonable job avoiding bad stocks. The weakest number is in fact the recall, which shows that the model has difficulty isolating good stocks, i.e., it produces more false negatives.

Implementation

We are now ready to put our model to use in a simulated environment. To do this, we first build a pipeline that takes in raw data and creates features that are free of outliers and properly scaled, as well as a label based on the one-year forward return for each stock at each point in time. This data is then used to train our model, which in turn classifies all stocks based on the next day's data. Simply put, we use 3 months of data to train the model and make one prediction based on that data. That data is then discarded, and a fresh set of data is used to train the next iteration.
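The walk-forward loop can be sketched as below. Everything here is a hypothetical stand-in: the data is random, and `LogisticRegression` replaces the tuned MLP purely to keep the sketch self-contained; the point is the train-once / predict-once / discard pattern.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)

# Four quarterly windows of (train features, train labels, next-day features)
quarters = [
    (rng.normal(size=(60, 5)), rng.integers(0, 2, size=60), rng.normal(size=(10, 5)))
    for _ in range(4)
]

predictions = []
for X_train, y_train, X_next in quarters:
    model = LogisticRegression().fit(X_train, y_train)  # fresh model each window
    predictions.append(model.predict(X_next))           # one classification pass
    # the window is then discarded and never reintroduced

print([p.shape for p in predictions])  # four batches of next-day labels
```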

A word on regime change

Regime change in the context of the stock market is when "something changes" and the market no longer behaves the way it had previously. It is precisely the issue of regime change that makes stock selection particularly difficult to do successfully over extended periods of time. Regime change is always obvious in hindsight, and almost never visible in advance. Any model that searches for patterns in data can have a particularly difficult time classifying properly once stocks begin to behave differently than they had before. To deal with this, our model is constructed to discard old data once it is used and never reintroduce it. This is also the reason the model is structured to "trade" quarterly.

Trading Engine

The trading engine is one of the more difficult parts of this project. It requires that we simulate both a market and a financial services firm at the same time. This is done by taking predetermined transactions for the list generated above, executing them using historical prices, and updating performance between dates. The process is also designed to ensure that enough funds exist for transactions, that additional lots are properly accounted for, that performance is continuously calculated, that positions are known at all times, and that no short selling is allowed. Though there are similarities across models, each model has its own trading engine based on the guidelines established for it.

Financial Evaluation

Portfolios are usually evaluated in both risk-adjusted and absolute terms. Evaluating a portfolio in absolute terms is often as simple as looking at its performance over time and comparing it to a peer or benchmark, whereas evaluating a portfolio in risk-adjusted terms usually requires that the analyst "level the playing field" somehow (math is usually involved).

Performance Chart

In absolute terms the portfolio performance is okay for the most part. We do see periods of underperformance initially, but the model did a respectable job protecting the downside during the start of the COVID pandemic. In fact, at the market's trough on March 23, 2020, the market was down 24% from the start of the evaluation, while the model was down only 15%. Since that time, the model has for the most part outperformed in absolute terms. The model did have some difficulty with the regime change as the market rotated from growth to value in late January 2021, but it recovered nicely after the next trading date.

Risk Adjusted Metrics

We use 5 major risk-adjusted metrics to evaluate our portfolio:

  1. The Sharpe Ratio - the average return less the risk-free rate (currently 0), normalized by the standard deviation of portfolio returns (a proxy for risk).
  2. The Sortino Ratio - like the Sharpe ratio, but uses a minimum accepted return (set to 0) and is normalized by downside deviation instead of standard deviation (a proxy for bad risk).
  3. The Treynor Ratio - also like the Sharpe ratio, but normalized by the OLS regression coefficient of the portfolio's returns on the market (a.k.a. beta).
  4. Max Drawdown - a measure of risk that identifies the portfolio's worst peak-to-trough performance over the period.
  5. The Calmar Ratio - the portfolio's average return normalized by the absolute value of its max drawdown.
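The five metrics follow directly from the definitions above. The return series below are random stand-ins for the portfolio and market, generated only so the arithmetic is concrete.

```python
import numpy as np

rng = np.random.default_rng(7)
m = rng.normal(0.0004, 0.009, size=252)            # synthetic daily market returns
r = 0.9 * m + rng.normal(0.0003, 0.004, size=252)  # synthetic portfolio, correlated with market
rf = 0.0                                           # risk-free rate, set to 0 as in the text

sharpe = (r.mean() - rf) / r.std()                 # excess return per unit of total risk
downside = r[r < 0.0].std()                        # deviation of below-target returns only
sortino = (r.mean() - 0.0) / downside              # minimum accepted return = 0

beta = np.cov(r, m, bias=True)[0, 1] / m.var()     # OLS slope of portfolio on market
treynor = (r.mean() - rf) / beta                   # excess return per unit of beta

wealth = np.cumprod(1.0 + r)                       # cumulative growth of $1
peaks = np.maximum.accumulate(wealth)              # running high-water mark
max_drawdown = ((wealth - peaks) / peaks).min()    # worst peak-to-trough loss (negative)
calmar = r.mean() / abs(max_drawdown)

print(round(sharpe, 3), round(sortino, 3), round(max_drawdown, 3))
```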

The risk-adjusted metrics paint a much brighter picture for the model, as all of them show that the model outperforms the market nicely.

CAPM

The capital asset pricing model or CAPM is one of the most widely used models in finance. It is used to evaluate risk, calculate expected returns, and measure portfolio outperformance. It is interpreted as follows:

  1. An asset's/portfolio's expected return is a linear function of its sensitivity to market variance.
  2. The asset's expected return is calculated as: expected return = risk-free rate + beta x (market return - risk-free rate).
  3. Any returns in excess of the expected return are considered outperformance or underperformance; this is known as alpha.
  4. Positive alpha is the goal of every portfolio manager on earth.
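Point 2 translates into a one-line formula; the inputs below are illustrative numbers, not the portfolio's actual figures.

```python
# CAPM expected return and alpha, per the relationship in point 2 above
risk_free = 0.02      # illustrative annual risk-free rate
beta = 1.1            # portfolio's sensitivity to the market
market_return = 0.08  # illustrative annual market return

expected = risk_free + beta * (market_return - risk_free)  # 0.02 + 1.1 * 0.06 = 0.086
actual = 0.11                                              # illustrative realized return
alpha = actual - expected                                  # 0.024 -> positive alpha

print(round(expected, 3), round(alpha, 3))
```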